Whether explicit or implicit, sets are a critical part of many pieces of
software. As a result, it is necessary to develop abstractions of sets for the
purposes of abstract interpretation, model checking, and deductive verification.
However, the construction of effective abstractions for sets is challenging
because they are a higher-order construct. It is necessary to reason about the
contents of sets as well as relationships between sets. This paper presents a
new abstraction for sets that is based on binary decision diagrams. It is
optimized for precisely and efficiently representing relations between sets while
still providing limited support for content reasoning.


1 Introduction 

In deductive software verification, it is common to want to verify programs that
manipulate sets in some way. In some cases, this is because sets are manipulated explicitly.
For example, most languages, including Python, C++, and Java, provide data structure
libraries that include sets. However, as Kuncak observed [Kun07], it is also often useful
to use sets to represent implicit invariants of non-set data structures, such as the set
of elements stored in a list. This is a useful invariant for verifying a list membership
function, for instance.

In addition to deductive verification, it is also useful to be able to automatically
analyze programs that manipulate implicit and explicit sets. This means that invariants
for sets need to be automatically inferred. To do this, we adopt the approach of abstract
interpretation [CC77], as it is a general approach. Several works have
developed and utilized abstractions suitable for sets. QUIC graphs [CCS13] use a
hypergraph to represent set constra 

Automatic analysis of sets is a well-studied space. There are abstractions that
have been used for automatically analyzing Python functions that explicitly manipulate
sets [CCS13]. There are also abstractions that use sets implicitly for properties of
other data structures. For instance, HOO [CCR14] uses sets to abstract key sets for
map-like data structures and FixBag [PTTC11] uses sets to abstract the elements of
a list. This allows automatic verification of modular specifications of certain kinds of
functions.

This paper develops a new kind of abstract domain for sets. Rather than adopting a
content-centric approach that focuses on the possible contents of each set, such as whether
$A = \{1, 2, 5\}$, the abstract domain presented here adopts an as-a-whole approach,
focusing on relationships between sets, such as $A \subseteq B$. By optimizing heavily for the
as-a-whole case, we find that it is useful to use different data structures than those focused
more on contents. Specifically, we find that the use of binary decision diagrams is
particularly efficient and useful.

This paper describes the construction of such a binary-decision-diagram-based
abstract domain through the following contributions:

• We introduce a binary-decision-diagram-based abstract domain for set-manipulating
programs. This domain supports common set operations for fully relational
as-a-whole reasoning. It utilizes the canonical representation of reduced,
ordered binary decision diagrams to abstract program states involving sets
more efficiently than existing abstractions for sets. (Section 3)

• We use a novel encoding that combines logical operations and set operations
into a single binary decision diagram without loss of precision. This encoding
efficiently translates set operations into binary decision diagrams. (Section 3.1)

• We provide a reduction with a value domain to augment the as-a-whole
BDD-based set domain with content-centric value reasoning. (Section 4)


2 Preliminaries 

In this section we will give necessary background for Boolean algebras and binary decision
diagrams. We will use the fact that set constraints form a Boolean algebra to create an
effective, efficient set abstraction in Section 3.


2.1 Sets as Boolean Algebras 

A Boolean algebra is a bounded lattice with a top element 1 and a bottom element
0. There are three operations in a Boolean algebra: (1) meet $\land$, which computes the
greatest lower bound of two elements in the lattice, (2) join $\lor$, which computes the least
upper bound of two elements in the lattice, and (3) complement $\lnot$, which maps each
lattice element to its complement. A Boolean algebra has the following properties for lattice
elements $a$, $b$, and $c$:


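$$
\begin{array}{lll}
a \lor (b \lor c) = (a \lor b) \lor c & a \land (b \land c) = (a \land b) \land c & \text{associativity} \\
a \lor b = b \lor a & a \land b = b \land a & \text{commutativity} \\
a \lor 0 = a & a \land 1 = a & \text{identity} \\
a \lor (b \land c) = (a \lor b) \land (a \lor c) & a \land (b \lor c) = (a \land b) \lor (a \land c) & \text{distributivity} \\
a \lor \lnot a = 1 & a \land \lnot a = 0 & \text{complements}
\end{array}
$$
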
The language of sets is also a Boolean algebra. The universal set $U$ is the top element.
The empty set $\emptyset$ is the bottom element. The intersection operation $\cap$ is the meet
operation. The union operation $\cup$ is the join operation. Finally, the set complement
operation $\cdot^c$ is the complement operation.
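
As a quick sanity check of this correspondence, the following Python sketch verifies the laws listed above by brute force over the powerset of a small universe. The universe `U` and the helper names (`powerset`, `complement`, `check_laws`) are illustrative assumptions and are not part of the formal development.

```python
from itertools import combinations, product

U = frozenset({1, 2, 3})  # small illustrative universe (an assumption)


def powerset(universe):
    """Return all subsets of `universe` as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def complement(a):
    """Set complement relative to the universe U."""
    return U - a


def check_laws():
    """Brute-force check of the Boolean algebra laws on subsets of U."""
    subsets = powerset(U)
    empty = frozenset()
    for a, b, c in product(subsets, repeat=3):
        # associativity
        assert a | (b | c) == (a | b) | c
        assert a & (b & c) == (a & b) & c
        # commutativity
        assert a | b == b | a
        assert a & b == b & a
        # identity: the empty set is bottom, U is top
        assert a | empty == a
        assert a & U == a
        # distributivity
        assert a | (b & c) == (a | b) & (a | c)
        assert a & (b | c) == (a & b) | (a & c)
        # complements
        assert a | complement(a) == U
        assert a & complement(a) == empty
    return True
```

The check is exhaustive only for the chosen three-element universe; it illustrates the laws rather than proving them in general.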

2.2 Binary Decision Diagrams 

Binary decision diagrams (BDDs) canonically and efficiently represent a Boolean algebra.
They are based on the if-then-else (ITE) normal form: 

Definition 1 (if-then-else normal form). If-then-else normal form represents a Boolean
algebra with the following syntactic structure:

$$
B ::= \mathrm{ite}(v, B_t, B_e) \mid \mathrm{true} \mid \mathrm{false}
$$

Additionally, if a term $\mathrm{ite}(v, B, B)$ occurs (both $B_t$ and $B_e$ are the same), it is replaced
with $B$.

The semantics of ITE normal form are defined under an assignment. An assignment
$I$ maps each variable $v$ to either the top element 1 or the bottom element 0. We use
the notation $v \mapsto 1 \in I$ to say that under the assignment $I$, the variable $v$ has the value
1. We use the judgment $I \vdash B \Downarrow r$ to say that under the assignment $I$, the formula $B$
evaluates to the value $r$, where $r$ is either 1 or 0. The semantics follow:


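$$
\begin{array}{cc}
\text{I-ITE-T} & \text{I-ITE-F} \\[0.5ex]
\dfrac{v \mapsto 1 \in I \qquad I \vdash B_t \Downarrow r}{I \vdash \mathrm{ite}(v, B_t, B_e) \Downarrow r}
&
\dfrac{v \mapsto 0 \in I \qquad I \vdash B_e \Downarrow r}{I \vdash \mathrm{ite}(v, B_t, B_e) \Downarrow r}
\\[3ex]
\text{I-True} & \text{I-False} \\[0.5ex]
\dfrac{}{I \vdash \mathrm{true} \Downarrow 1}
&
\dfrac{}{I \vdash \mathrm{false} \Downarrow 0}
\end{array}
$$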
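
The four rules above can be read directly as a recursive evaluator. The following is a minimal Python sketch under the assumption that terms are encoded with a hypothetical `Ite` dataclass and Python booleans as leaves; the names `Ite`, `evaluate`, and `X_AND_Y` are introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Ite:
    """An ite(v, B_t, B_e) term; leaves are the Python booleans True/False."""
    var: str
    then: "Term"
    els: "Term"


Term = Union[bool, Ite]


def evaluate(assignment: Dict[str, bool], term: Term) -> bool:
    """Evaluate `term` under `assignment`, one case per inference rule."""
    if isinstance(term, bool):             # I-True / I-False
        return term
    if assignment[term.var]:               # I-ITE-T: v maps to 1
        return evaluate(assignment, term.then)
    return evaluate(assignment, term.els)  # I-ITE-F: v maps to 0


# "x and y" written in ITE normal form (an illustrative term)
X_AND_Y = Ite("x", Ite("y", True, False), False)
```

For instance, `evaluate({"x": True, "y": False}, X_AND_Y)` follows I-ITE-T for `x`, then I-ITE-F for `y`, and ends at the `false` leaf.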


A formula expressed in if-then-else normal form is also a binary decision diagram. 
However, the most commonly used form of binary decision diagrams is assumed to be 
reduced and ordered. 


Definition 2 (Reduced Ordered Binary Decision Diagram). A reduced ordered binary 
decision diagram (referred to as a BDD) is a formula written in if-then-else normal form 
with the following two additional restrictions: 


1. There is a total order $\prec$ among variables $v$. If $v_1 \prec v_2$ and both variables occur in
the formula, then in the evaluation $v_1$ must be used before $v_2$.

2. Sharing of formulas is mandated, so that if the same formula $B$ occurs more than
once in the same formula, it is shared.

The semantics of BDDs is the same as for if-then-else normal form.

These restrictions give BDDs canonicity and efficiency. For a given ordering, there 
is only one BDD that represents a given formula. Additionally, because of the sharing 
mandate, operations that are applied over a whole BDD often need only be applied once 
to each physically unique formula and thus sharing reduces work for many algorithms. 

In addition to canonicity and efficiency, BDDs support quantifier elimination. Both
existential and universal quantifiers can be eliminated from formulas with reasonable
efficiency. This is why BDDs are often preferred for solving QBF problems [PV03,
Ben04].

BDDs are represented as a directed acyclic graph where each vertex is an ite() function.
The vertex is labeled with the variable that is being used in the if-then-else condition.
The two outgoing edges represent the else case on the left with a solid line and
the then case on the right with a dashed line. A false is represented with $\bot$ and a true
is represented with $\top$ inside a vertex with no outgoing edges.

Example 1 (BDD representation of logic). Consider the following formula $f$:

$$
f = v_1 \land \lnot v_3 \lor \lnot v_2 \land \lnot v_3
$$

Representing this formula in if-then-else normal form gives the following structure,
assuming the ordering $v_1 \prec v_2 \prec v_3$ was chosen.

$$
\mathrm{ite}(v_1, \mathrm{ite}(v_3, \mathrm{false}, \mathrm{true}), \mathrm{ite}(v_2, \mathrm{false}, \mathrm{ite}(v_3, \mathrm{false}, \mathrm{true})))
$$

The BDD of the same formula is shown in Figure 1. It is the same as the if-then-else
normal form except that it exploits sharing. Note that $\mathrm{ite}(v_3, \mathrm{false}, \mathrm{true})$ occurs twice in
the formula and the equivalent node $v_3$ only occurs once in the BDD. The two incoming
arrows to $v_3$ indicate that sharing has improved the efficiency of the representation.



Figure 1: BDD representation of the formula $v_1 \land \lnot v_3 \lor \lnot v_2 \land \lnot v_3$
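
The sharing in Example 1 can be reproduced with a unique table (hash-consing). The sketch below is an assumption-laden illustration, not the construction used by production BDD packages: it builds the diagram by enumerating all assignments, which is exponential and only suitable for tiny examples. The class name `Bdd` and its methods are hypothetical.

```python
FALSE, TRUE = 0, 1  # terminal nodes; inner nodes get integer ids >= 2


class Bdd:
    """A tiny reduced, ordered BDD store built around a unique table."""

    def __init__(self, order):
        self.order = list(order)  # variable order, first variable on top
        self.unique = {}          # (var, then, else) -> node id
        self.nodes = {}           # node id -> (var, then, else)

    def mk(self, var, then, els):
        """Create ite(var, then, els), applying reduction and sharing."""
        if then == els:              # reduction: the test is redundant
            return then
        key = (var, then, els)
        if key not in self.unique:   # sharing: reuse structurally equal nodes
            node_id = len(self.nodes) + 2
            self.unique[key] = node_id
            self.nodes[node_id] = key
        return self.unique[key]

    def build(self, fn, env=None, level=0):
        """Shannon-expand the predicate `fn` following the variable order."""
        env = env or {}
        if level == len(self.order):
            return TRUE if fn(env) else FALSE
        var = self.order[level]
        then = self.build(fn, {**env, var: True}, level + 1)
        els = self.build(fn, {**env, var: False}, level + 1)
        return self.mk(var, then, els)


def example_formula(env):
    """f = (v1 and not v3) or (not v2 and not v3), as in Example 1."""
    return (env["v1"] and not env["v3"]) or (not env["v2"] and not env["v3"])


bdd = Bdd(["v1", "v2", "v3"])
root = bdd.build(example_formula)
v3_nodes = [n for n in bdd.nodes.values() if n[0] == "v3"]
```

Under this sketch, `v3_nodes` contains a single entry even though the test on `v3` is reached along two different paths, which mirrors the two incoming arrows described in the example.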

Not all is rosy for BDDs, however. They are limited by the total ordering on the 
variables. One ordering may be exponentially more efficient than another. This means 
that efficient use of BDDs requires a good ordering. Fortunately, achieving perfect 
efficiency is not required and there are many algorithms for identifying good orderings.
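
To make the sensitivity to ordering concrete, the following Python sketch counts the inner nodes of the reduced, ordered BDD of one formula under two orderings. The formula (a disjunction of three variable pairs), the helper `robdd_size`, and the variable names are illustrative assumptions; the construction enumerates all assignments and is only meant for small inputs.

```python
def robdd_size(fn, order):
    """Count inner nodes of the reduced ordered BDD of `fn` under `order`."""
    unique = {}  # (var, then, else) -> node id; terminals are 0 and 1

    def mk(var, then, els):
        if then == els:              # reduction rule
            return then
        key = (var, then, els)
        if key not in unique:        # sharing rule
            unique[key] = len(unique) + 2
        return unique[key]

    def build(env, level):
        if level == len(order):
            return 1 if fn(env) else 0
        var = order[level]
        then = build({**env, var: True}, level + 1)
        els = build({**env, var: False}, level + 1)
        return mk(var, then, els)

    build({}, 0)
    return len(unique)


def pairs_formula(env):
    """(x1 and y1) or (x2 and y2) or (x3 and y3)."""
    return any(env[f"x{i}"] and env[f"y{i}"] for i in (1, 2, 3))


interleaved = ["x1", "y1", "x2", "y2", "x3", "y3"]
separated = ["x1", "x2", "x3", "y1", "y2", "y3"]

good = robdd_size(pairs_formula, interleaved)
bad = robdd_size(pairs_formula, separated)
```

Keeping each pair adjacent in the order lets the diagram forget a pair as soon as it has been tested, whereas the separated order forces it to remember every combination of the `x` variables before any `y` variable is seen, so `bad` is larger than `good` and the gap grows with the number of pairs.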

2.3 Sets as Binary Decision Diagrams

Since BDDs are well suited for representing Boolean algebras and sets are Boolean
algebras, a BDD can be used as a set abstraction. However, in practice, this is often too
coarse an abstraction. Set operations often utilize the values stored within the sets,
for example, selecting an element from a set or computing a comprehension over a set.
These kinds of operations are not trivially represented by a BDD, as it is only suitable
for reasoning about sets as-a-whole. Any reasoning about contents is lost.

In the remainder of this paper we present two things: (1) the basic as-a-whole
abstraction of sets using BDDs, along with a couple of examples of where this
type of abstraction may be useful; and (2) extensions to the basic as-a-whole
abstraction to support some amount of content reasoning. Specifically, we focus on a
prefix string abstraction to reason about contents of sets that are strings prefixed by
string constants.

3 Abstracting Set Constraints with BDDs 

In this section, we present an abstraction for symbolic, as-a-whole sets. Symbolic,
as-a-whole sets abstract away the individual constituents of sets and focus entirely upon
the relationships between sets. For example, symbolic, as-a-whole sets would be able
to precisely abstract the constraint $A \subseteq B$, but would not be able to precisely abstract the
constraint $A = \{1, 2, 5\}$.

In this section, we will use the variables $A$, $B$, and $C$ to represent set variables. These
symbols are members of the set Vars. For examples, we will assume that the sets we
are abstracting contain integers $\mathbb{Z}$, but for the formalization we do not define Vals, as
the sets can contain any non-empty type of values.

The concrete program state that we will be abstracting is a valuation, which assigns
a set value to each set symbol. A valuation $\eta$ is a member of the concrete program
state Conc, which is defined as follows:

$$
\eta \in \mathrm{Conc} = \mathrm{Vars} \to \mathcal{P}(\mathrm{Vals})
$$

This means that in each $\eta$, each variable $v \in \mathrm{Vars}$ maps to a set of values.

Definition 3 (BDD-based, symbolic, as-a-whole set abstraction). A BDD-based, symbolic,
as-a-whole set abstraction is a binary decision diagram with the normal syntax:

$$
\mathrm{BDD} \ni B ::= \mathrm{ite}(v, B_t, B_e) \mid \mathrm{true} \mid \mathrm{false}
$$

Additionally, it has the orderedness and compactness restrictions given in Definition 2.

The concretization of a BDD into a set of valuations is given in two parts. The first part,
the concretization $\gamma$, selectively validates elements returned by the second part. The second
part, $\gamma_S$, constructs a concretization of the BDD augmented with a validation set $S$.



The functions $\gamma$ and $\gamma_S$ have the following types.

$$
\begin{aligned}
\gamma &: \mathrm{BDD} \to \mathcal{P}(\mathrm{Conc}) \\
\gamma_S &: \mathrm{BDD} \to \mathcal{P}(\mathrm{Conc} \times \mathcal{P}(\mathrm{Vals}))
\end{aligned}
$$

The $\gamma$ function takes a BDD and returns a set of valuation functions. To do this, it calls
the $\gamma_S$ function, which returns a set of candidate valuations, each paired with a validation
set. If the validation set is equal to the universe, that is, the set of all values Vals, that
valuation function has been validated and can be included in the concretization. The
definition of $\gamma$ follows.


$$
\gamma(B) = \{\, \eta \mid (\eta, \mathrm{Vals}) \in \gamma_S(B) \,\}
$$


The construction of the validation set follows from the Boolean algebra that BDDs are
constructed from. We can see this construction in the definition of $\gamma_S$.


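$$
\begin{aligned}
\gamma_S(\mathrm{false}) &= \{\, (\eta, \emptyset) \mid \eta \in \mathrm{Conc} \,\} \\
\gamma_S(\mathrm{true}) &= \{\, (\eta, \mathrm{Vals}) \mid \eta \in \mathrm{Conc} \,\} \\
\gamma_S(\mathrm{ite}(v, B_t, B_e)) &= \left\{\, (\eta, S) \;\middle|\;
  \begin{array}{l}
  (\eta, S_t) \in \gamma_S(B_t) \land (\eta, S_e) \in \gamma_S(B_e) \\
  \land\; S = (\eta(v)^c \cup S_t) \cap (\eta(v) \cup S_e)
  \end{array} \right\}
\end{aligned}
$$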


The $\gamma_S$ function is defined in three parts, one for each syntactic class of a BDD.
Note that the additional orderedness and compactness restrictions do not affect the
concretization, only the efficiency and canonicity of the representation. The first two
classes reveal the nature of the validation set. The application of $\gamma_S$ to false gives the
set of all possible valuations paired with the validation set $\emptyset$. Since the $\emptyset$ validation set is
never equal to Vals, none of these valuations will be validated. The converse is the true
case, where the validation set Vals is by definition equal to Vals, so all possible valuations
are validated.

The third syntactic class, which handles the ite() function, performs the conditional
operation on the validation set. It computes the validation sets for both the then ($S_t$)
branch and the else ($S_e$) branch. The new validation set is then computed by looking
up the set for the current variable $v$ in the valuation and combining it with $S_t$ and $S_e$.

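The reason for this formula comes from the correspondence between set algebra and
Boolean algebra. The operation $\mathrm{ite}(v, B_t, B_e)$ has the following definition in Boolean
algebra:

$$
(v \to B_t) \land (\lnot v \to B_e)
$$

Assuming the correspondence given between $\land$ and $\cap$, $\lor$ and $\cup$, and $\lnot$ and $\cdot^c$, the formula
for the computation of the new validation set follows directly: rewriting each implication
$x \to y$ as $\lnot x \lor y$ gives $(\lnot v \lor B_t) \land (v \lor B_e)$, which corresponds to
$(\eta(v)^c \cup S_t) \cap (\eta(v) \cup S_e)$.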
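
For a single fixed valuation, the validation set can be computed by a direct recursion over the BDD. The following Python sketch assumes a small finite stand-in for Vals and encodes BDDs as nested tuples; the names `VALS`, `validation_set`, `in_concretization`, and `A_SUBSET_B` are hypothetical and chosen only for this illustration.

```python
VALS = frozenset(range(4))  # small finite stand-in for Vals (an assumption)


def validation_set(bdd, eta):
    """Return the validation set that gamma_S pairs with the valuation `eta`.

    `bdd` is True, False, or a tuple (var, then_branch, else_branch).
    `eta` maps each set variable name to a frozenset of values.
    """
    if bdd is True:
        return VALS
    if bdd is False:
        return frozenset()
    var, then_branch, else_branch = bdd
    s_t = validation_set(then_branch, eta)
    s_e = validation_set(else_branch, eta)
    # S = (eta(v)^c union S_t) intersect (eta(v) union S_e)
    return ((VALS - eta[var]) | s_t) & (eta[var] | s_e)


def in_concretization(bdd, eta):
    """A valuation is validated exactly when its validation set is all of VALS."""
    return validation_set(bdd, eta) == VALS


# "A is a subset of B", i.e. (not A) or B, as ite(A, ite(B, true, false), true)
A_SUBSET_B = ("A", ("B", True, False), True)
```

With `A = {1}` and `B = {1, 2}` every value ends up in the validation set, so the valuation is accepted; with `A = {1, 3}` the value 3 is missing from the validation set and the valuation is rejected.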


3.1 Domain Operations 

The domain operations for this set domain are derived directly from the equivalent
BDD operations. Typical BDD implementations provide at least the basic $\land$, $\lor$, and $\lnot$
operations, along with universal and existential quantification. The domain operations
can be derived from those.

Constructing Expressions Because of the lack of support for content reasoning, set
expressions are limited to the following:

$$
E ::= \emptyset \mid \mathrm{Vals} \mid A \mid E \cup E \mid E \cap E \mid E \uplus E \mid E \setminus E \mid E^c
$$

This language incorporates all of the symbolic expressions for sets, including union,
intersection, disjoint union, set difference, and set complement.

To construct the binary decision diagrams that correspond to these expressions, we use
a translation function $\mathrm{tr}_E(\cdot)$ that converts a set expression into a pair of binary decision
diagrams. This function is shown in Figure 2. The first resulting BDD of this function
is the translated expression and the second resulting BDD represents side constraints
on that expression. These side constraints are necessary to translate the disjoint union
expression, which has a side constraint that the sets being unioned are disjoint. All other
operations simply pass along the constraints, conjoining them in the BDD.


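$$
\begin{aligned}
\mathrm{tr}_E(\emptyset) &= (\mathrm{false}, \mathrm{true}) \\
\mathrm{tr}_E(\mathrm{Vals}) &= (\mathrm{true}, \mathrm{true}) \\
\mathrm{tr}_E(A) &= (A, \mathrm{true}) \\
\mathrm{tr}_E(E_1 \cup E_2) &= \mathbf{let}\ (e_1, c_1) = \mathrm{tr}_E(E_1)\ \mathbf{in} \\
  &\phantom{=}\ \mathbf{let}\ (e_2, c_2) = \mathrm{tr}_E(E_2)\ \mathbf{in} \\
  &\phantom{=}\ (e_1 \lor e_2,\ c_1 \land c_2) \\
\mathrm{tr}_E(E_1 \cap E_2) &= \mathbf{let}\ (e_1, c_1) = \mathrm{tr}_E(E_1)\ \mathbf{in} \\
  &\phantom{=}\ \mathbf{let}\ (e_2, c_2) = \mathrm{tr}_E(E_2)\ \mathbf{in} \\
  &\phantom{=}\ (e_1 \land e_2,\ c_1 \land c_2) \\
\mathrm{tr}_E(E_1 \uplus E_2) &= \mathbf{let}\ (e_1, c_1) = \mathrm{tr}_E(E_1)\ \mathbf{in} \\
  &\phantom{=}\ \mathbf{let}\ (e_2, c_2) = \mathrm{tr}_E(E_2)\ \mathbf{in} \\
  &\phantom{=}\ (e_1 \lor e_2,\ c_1 \land c_2 \land \lnot(e_1 \land e_2)) \\
\mathrm{tr}_E(E_1 \setminus E_2) &= \mathbf{let}\ (e_1, c_1) = \mathrm{tr}_E(E_1)\ \mathbf{in} \\
  &\phantom{=}\ \mathbf{let}\ (e_2, c_2) = \mathrm{tr}_E(E_2)\ \mathbf{in} \\
  &\phantom{=}\ (e_1 \land \lnot e_2,\ c_1 \land c_2) \\
\mathrm{tr}_E(E^c) &= \mathbf{let}\ (e, c) = \mathrm{tr}_E(E)\ \mathbf{in} \\
  &\phantom{=}\ (\lnot e,\ c)
\end{aligned}
$$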


Figure 2: The translation function $\mathrm{tr}_E(\cdot)$ that converts a set expression into a pair of
BDDs. The first BDD represents the expression and the second BDD represents
side constraints on that expression.

Figure 2 shows that the translation is the literal replacement of set operations with the
corresponding BDD operation. Of course, operations that deal with individual values
will have to be abstracted into this language. This means that expressions like singleton
sets (for example $\{1\}$) have to be abstracted by a set symbol (for example $A$). Other
operations such as comprehensions (e.g. $\{\, x \in A \mid p(x) \,\}$) can be abstracted by introducing
a symbol and constraining that symbol (e.g. $B$ in the expression with the side
constraint $B \subseteq A$). The exact form of this abstraction is unspecified here. The only
requirement is that the set expression be translated into this language so that it can be
precisely translated via $\mathrm{tr}_E(\cdot)$ to its BDD equivalent.
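
As a small worked instance of the translation in Figure 2, consider the expression $(A \uplus B) \setminus C$, which is an illustrative choice rather than an example from the original development. Applying the rules bottom-up and simplifying conjunctions with true gives:

```latex
\begin{aligned}
\mathrm{tr}_E(A \uplus B)
  &= (A \lor B,\ \mathrm{true} \land \mathrm{true} \land \lnot(A \land B))
   = (A \lor B,\ \lnot(A \land B)) \\
\mathrm{tr}_E((A \uplus B) \setminus C)
  &= ((A \lor B) \land \lnot C,\ \lnot(A \land B) \land \mathrm{true})
   = ((A \lor B) \land \lnot C,\ \lnot(A \land B))
\end{aligned}
```

The disjointness side constraint introduced by the disjoint union is carried unchanged through the enclosing set difference.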

Constructing Constraints Constructing constraints from expressions requires a similar 
language restriction: 

$$
K ::= \mathrm{true} \mid \mathrm{false} \mid E \subseteq E \mid E = E \mid K \land K \mid K \lor K
$$

The language supports several commonly used set constraints, including (non-strict) subset
constraints between two set expressions and equality between two set expressions.
Additionally, it supports several standard Boolean combinators, including conjunction
and (somewhat unusually for an abstract domain) disjunction. Other operations are
unsupported because they cannot be implemented solely with BDDs using the mechanism
presented here.

The translation of constraints into BDDs does two things. First, it actually constructs 
the constraints out of the constituent expressions or constraints. Second, it merges 
the side-constraint BDDs that are produced by the translation of expressions into the 
constraints so that there is only a single BDD as a result. The resulting BDD compactly 
represents the set expressions and set constraints together. 

The definition of the translation function $\mathrm{tr}_K(\cdot)$ is shown in Figure 3. It translates
Boolean constraints directly into their BDD counterparts. The subset constraint uses a
Boolean implication ($\lnot e_1 \lor e_2$) to merge the two expression BDDs. The side constraints
are conjoined to the constraint that utilized the expressions producing those constraints.
The equality constraint is similar to the subset constraint except that it uses a
bi-implication instead of the single implication.


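$$
\begin{aligned}
\mathrm{tr}_K(\mathrm{true}) &= \mathrm{true} \\
\mathrm{tr}_K(\mathrm{false}) &= \mathrm{false} \\
\mathrm{tr}_K(K_1 \land K_2) &= \mathrm{tr}_K(K_1) \land \mathrm{tr}_K(K_2) \\
\mathrm{tr}_K(K_1 \lor K_2) &= \mathrm{tr}_K(K_1) \lor \mathrm{tr}_K(K_2) \\
\mathrm{tr}_K(E_1 \subseteq E_2) &= \mathbf{let}\ (e_1, c_1) = \mathrm{tr}_E(E_1)\ \mathbf{in} \\
  &\phantom{=}\ \mathbf{let}\ (e_2, c_2) = \mathrm{tr}_E(E_2)\ \mathbf{in} \\
  &\phantom{=}\ (\lnot e_1 \lor e_2) \land c_1 \land c_2 \\
\mathrm{tr}_K(E_1 = E_2) &= \mathbf{let}\ (e_1, c_1) = \mathrm{tr}_E(E_1)\ \mathbf{in} \\
  &\phantom{=}\ \mathbf{let}\ (e_2, c_2) = \mathrm{tr}_E(E_2)\ \mathbf{in} \\
  &\phantom{=}\ (\lnot e_1 \lor e_2) \land (\lnot e_2 \lor e_1) \land c_1 \land c_2
\end{aligned}
$$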

Figure 3: The translation function $\mathrm{tr}_K(\cdot)$ that converts set constraints into a BDD

Using these constraint forms, it is possible to implement the abstraction and/or the 
constrain domain operations. These are useful for defining transfer functions for the 
program. 
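
The two translation functions can be sketched in a few lines of Python. To keep the sketch self-contained, Boolean functions are represented here as truth-table bitmasks over a fixed variable list, which is only a canonical stand-in for BDDs and an assumption of this example; a real implementation would call the corresponding operations of a BDD package. The tuple encoding of expressions and the names `tr_e`, `tr_k`, `var`, and `neg` are likewise hypothetical.

```python
VARS = ["A", "B", "C"]          # fixed set variables (an assumption)
ROWS = 1 << len(VARS)           # number of truth-table rows
FULL = (1 << ROWS) - 1          # truth table of `true`; 0 is `false`


def var(name):
    """Truth table of a single variable: bit r is set iff the variable is 1 in row r."""
    i = VARS.index(name)
    return sum(1 << r for r in range(ROWS) if (r >> i) & 1)


def neg(f):
    """Logical negation of a truth table."""
    return FULL & ~f


def tr_e(e):
    """Translate a set expression into (expression, side constraint)."""
    op = e[0]
    if op == "empty":
        return (0, FULL)
    if op == "vals":
        return (FULL, FULL)
    if op == "var":
        return (var(e[1]), FULL)
    if op == "compl":
        f, c = tr_e(e[1])
        return (neg(f), c)
    if op not in ("union", "inter", "dunion", "diff"):
        raise ValueError(op)
    (f1, c1), (f2, c2) = tr_e(e[1]), tr_e(e[2])
    side = c1 & c2
    if op == "union":
        return (f1 | f2, side)
    if op == "inter":
        return (f1 & f2, side)
    if op == "dunion":                      # disjoint union adds disjointness
        return (f1 | f2, side & neg(f1 & f2))
    return (f1 & neg(f2), side)             # set difference


def tr_k(k):
    """Translate a set constraint into a single Boolean function."""
    op = k[0]
    if op == "true":
        return FULL
    if op == "false":
        return 0
    if op == "and":
        return tr_k(k[1]) & tr_k(k[2])
    if op == "or":
        return tr_k(k[1]) | tr_k(k[2])
    (f1, c1), (f2, c2) = tr_e(k[1]), tr_e(k[2])
    if op == "subset":
        return (neg(f1) | f2) & c1 & c2
    if op == "eq":
        return (neg(f1) | f2) & (neg(f2) | f1) & c1 & c2
    raise ValueError(op)


A, B = ("var", "A"), ("var", "B")
```

Because the representation is canonical, two constraints translate to the same value exactly when they are equivalent at the Boolean level; for example, the conjunction of `A ⊆ B` and `B ⊆ A` translates to the same table as `A = B`.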

Join and Widening The join and widening operations are trivial. Since the constraint 
language supports disjunction, the disjunction of the two BDDs gives a precise join: 

$$
B_1 \sqcup B_2 = B_1 \lor B_2
$$

Critically, because this is represented using a Boolean algebra, the resulting lattice has
finite height (for a fixed set of variables Vars). This means that the join is also a suitable
widening. It may not be an optimal widening as the lattice has an exponential height 
and may take a long time to converge. This is the challenge in BDD-based forward 
reachability and suggests that there may be ways of improving this widening operator 
using model checking techniques. 

Containment The containment operation is also trivial. It relies upon the implication 
ordering in the Boolean algebra lattice: 


$$
B_1 \sqsubseteq B_2 = B_1 \to B_2
$$

The implementation of this is not quite as trivial, however. This is because there is
implicit universal quantification in the above formula: the implication must be valid.
Because BDDs support universal quantification, the following could be implemented:

$$
\forall v.\ \lnot B_1 \lor B_2
$$

where $\forall v.$ universally quantifies over each variable $v$ in Vars. However, it is much more
efficient to check the satisfiability of the negation:

$$
\mathrm{SAT}(B_1 \land \lnot B_2)
$$

If this formula is unsatisfiable, its negation, $B_1 \to B_2$, must be valid.
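
The join and the satisfiability-based containment check can be sketched as follows. As an assumption of this example, Boolean functions are again represented as truth-table bitmasks standing in for BDDs, so the satisfiability test reduces to a comparison with zero; with a real BDD package the analogous test would compare against the `false` terminal. All names (`join`, `contained`, `is_sat`, and the sample constraints) are hypothetical.

```python
VARS = ["A", "B"]               # fixed set variables (an assumption)
ROWS = 1 << len(VARS)
FULL = (1 << ROWS) - 1          # truth table of `true`; 0 is `false`


def var(name):
    """Truth table of a single variable."""
    i = VARS.index(name)
    return sum(1 << r for r in range(ROWS) if (r >> i) & 1)


def neg(f):
    """Logical negation of a truth table."""
    return FULL & ~f


def implies(f, g):
    """Boolean implication, used to encode subset constraints."""
    return neg(f) | g


def join(b1, b2):
    """Join of two abstract states: plain disjunction."""
    return b1 | b2


def is_sat(f):
    """A truth table is satisfiable iff at least one row is true."""
    return f != 0


def contained(b1, b2):
    """b1 is below b2 iff b1 AND NOT b2 is unsatisfiable."""
    return not is_sat(b1 & neg(b2))


A, B = var("A"), var("B")
A_SUB_B = implies(A, B)                 # A is a subset of B
A_EQ_B = A_SUB_B & implies(B, A)        # A equals B
A_EMPTY = neg(A)                        # A equals the empty set
JOINED = join(A_EQ_B, A_EMPTY)
```

Here `contained(A_EQ_B, A_SUB_B)` holds because equality is the stronger constraint, and each operand of a join is contained in the result.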

Projection Projecting out variables is the primary reason BDDs are preferable for 
this application over SAT solvers. BDDs support existential quantification directly and 
consequently it can be used to implement projection. For example, to project out the
variable $v$ from the set domain instance $B$, the following BDD operation can be used:

$$
\exists v.\ B
$$
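
Existential quantification can be understood as the disjunction of the two cofactors, $\exists v.\ B = B[v := 0] \lor B[v := 1]$. The following Python sketch illustrates this on truth-table bitmasks, which are used here only as a stand-in for BDDs; the function `exists` and the other names are assumptions of this example and not the API of any particular BDD package.

```python
VARS = ["A", "B", "C"]          # fixed set variables (an assumption)
ROWS = 1 << len(VARS)
FULL = (1 << ROWS) - 1          # truth table of `true`


def var(name):
    """Truth table of a single variable."""
    i = VARS.index(name)
    return sum(1 << r for r in range(ROWS) if (r >> i) & 1)


def neg(f):
    """Logical negation of a truth table."""
    return FULL & ~f


def subset(f, g):
    """Encoding of a subset constraint as a Boolean implication."""
    return neg(f) | g


def exists(f, name):
    """Project out `name`: row r is true iff f holds with the variable 0 or 1."""
    i = VARS.index(name)
    out = 0
    for r in range(ROWS):
        lo = r & ~(1 << i)      # same row with the variable set to 0
        hi = r | (1 << i)       # same row with the variable set to 1
        if (f >> lo) & 1 or (f >> hi) & 1:
            out |= 1 << r
    return out


A, B, C = var("A"), var("B"), var("C")
CHAIN = subset(A, B) & subset(B, C)     # A subset of B, and B subset of C
PROJECTED = exists(CHAIN, "B")
```

Projecting `B` out of the chain leaves exactly the constraint that `A` is a subset of `C`, which is the relationship between the remaining variables that the chain implies.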


4 Set Contents 

The abstraction presented in Section 3 does not permit reasoning about any contents
of sets. It is strictly an as-a-whole abstraction. However, it can be adapted for varying 
amounts of content reasoning by implementing query-based reductions. Query-based 
reductions utilize an external domain for keeping track of possible contents of sets and 
then use queries on the BDD to drive reductions in that domain. 

(This section will be fleshed out in a future version) 

5 Related Work


There exist two set abstract domains that focus on as-a-whole reasoning: QUIC graphs
[CCS13] and FixBag [PTTC11]. Neither of these systems uses a canonical representation
like BDDs. They use a proof system along with a heuristic-guided saturation/proof
search. This methodology lends itself to efficiently keeping track of content information
about sets, but it is much less suited to efficient as-a-whole reasoning. While
both the BDD-based approach and the QUIC graphs/FixBag approach have an exponential
worst case, the QUIC graph and FixBag approach often encounters that worst case
because the saturation technique attempts to enumerate all $O(2^n)$ possibilities. The
BDD-based approach often does not encounter this problem because the ordering
and the sharing frequently eliminate the exponential cost.

There is no good comparison between the BDD-based set domain extended with content
reasoning and QUIC graphs/FixBag. The reason is that BDD-based reasoning is heavily
focused on precisely performing as-a-whole reasoning, while sacrificing precision (or
slowing down) in the content reasoning. Conversely, QUIC graphs/FixBag sacrifice
as-a-whole precision and performance to get better content reasoning. Different applications
may have different needs.

The use of BDDs for model-checking-style verification is well documented [Cla08].
BDDs were used to help solve the state explosion problem by symbolically representing 
many states implicitly [BCM+90, McM92]. Of course, the complexity of these approaches
is comparable to the complexity of abstract interpretation using BDDs to represent sets.
This is because if sets are used extensively, the set structure will end up capturing much 
of the control flow. This results in the BDD needing to solve similar problems to model 
checking, which implicitly represents the control flow in the logic along with the data. 
Of course the use of BDDs for symbolic model checking does not take advantage of the 
fact that they can represent things other than true or false values. 

The use of BDDs for abstract domains is more recent. They have been primarily used
for logico-numeric abstraction in BddApron [Jea09]. BddApron combines the Apron
numeric abstract domain library [JM09] with BDDs to efficiently support disjunction.
The idea is to use the BDD to represent the control flow, but to use the Apron domains
for numeric reasoning. Aside from the fact that it is intended to capture control flow, this
is similar to what the BDD-based set domain does with reductions: the set abstraction
restricts another abstraction that reasons about values.


6 Conclusion 

This paper describes a new method for abstracting states of set-manipulating programs. 
The domain is built upon the logical foundation of binary decision diagrams and utilizes 
the fact that binary decision diagrams can represent any Boolean algebra, not just the
standard 0-1 logic. By using the well-engineered, well-designed data structures for
BDDs, we have found that when as-a-whole reasoning is needed, it is generally more 
efficient and more precise to use BDDs than other proof-based methods. 

Of course, the sacrifice that was made was in terms of content reasoning. The BDDs
do not provide a native way of reasoning about the specific contents of sets. This can be
remedied somewhat by using a value abstraction and performing query-based reductions
with the BDDs. However, this space remains to be explored more thoroughly. The
query-based approach introduces significant run-time overhead if applied thoroughly. Either
good heuristics should be employed or a new hybrid approach should be developed.
What works best remains to be seen.